Less is more: towards an optimal universal description of protein folds
Abstract
MOTIVATION Identifying and characterizing regularities in protein structure can reveal the mechanisms governing protein structure, function and evolution. Here we focus on an intermediate level of regularity, between individual secondary structure elements and whole folds. We have developed automated methods to systematically construct a dictionary of supersecondary structures that can be used as 'protein parts' to describe fold-sized structures.
RESULTS The dictionary was constructed by aligning representative structures of all known folds, clustering similar substructures and selecting the most descriptive substructures in a minimum description length fashion. We show that the dictionary is compact and descriptive, capable of describing a substantial fraction of all known protein folds. We performed simulations using independent sets of training and testing folds. Dictionaries generated from the training set had high coverage over the folds in the testing set, suggesting that dictionary entries reflect general features of protein structures and should be capable of describing novel protein folds.
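To make the minimum description length (MDL) criterion concrete, here is a minimal Python sketch of greedy dictionary selection under a toy cost model. It is not the authors' procedure. Several details are hypothetical: the representation of folds as sets of residue indices, the fixed 8-bit residue costs, the log2(|D|+1) pointer cost, and all function and part names (`description_length`, `greedy_mdl`, `coverage`, `hairpin`, `rare`). It also assumes that the structural alignment and clustering steps have already produced each candidate part's occurrences. Under this model a part enters the dictionary only if the bits it saves when encoding folds exceed the bits needed to store it. Coverage on held-out folds is then the fraction of their residues explained by the selected parts.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy encoding (not the paper's data model): a fold is a set of
# residue positions; a part maps fold names to the residue sets it covers there.
Parts = Dict[str, Dict[str, List[FrozenSet[int]]]]

RAW_BITS_PER_RESIDUE = 8.0   # assumed cost of spelling out an uncovered residue
PART_BITS_PER_RESIDUE = 8.0  # assumed cost of storing one residue of a part


def _cover(dictionary: List[str], parts: Parts, fold: str) -> Tuple[Set[int], int]:
    """Residues of `fold` covered by dictionary parts, and the number of part uses."""
    covered: Set[int] = set()
    uses = 0
    for p in dictionary:
        for occ in parts.get(p, {}).get(fold, []):
            covered |= occ
            uses += 1
    return covered, uses


def description_length(dictionary, parts, part_sizes, folds) -> float:
    """Bits to store the dictionary plus bits to encode every fold with it."""
    dict_bits = sum(PART_BITS_PER_RESIDUE * part_sizes[p] for p in dictionary)
    pointer_bits = math.log2(len(dictionary) + 1)  # one extra code for "raw"
    fold_bits = 0.0
    for fold, residues in folds.items():
        covered, uses = _cover(dictionary, parts, fold)
        fold_bits += uses * pointer_bits
        fold_bits += len(residues - covered) * RAW_BITS_PER_RESIDUE
    return dict_bits + fold_bits


def greedy_mdl(parts, part_sizes, folds) -> Tuple[List[str], float]:
    """Greedily add the part that most shortens the total description."""
    dictionary: List[str] = []
    best = description_length(dictionary, parts, part_sizes, folds)
    while True:
        scored = [(description_length(dictionary + [p], parts, part_sizes, folds), p)
                  for p in parts if p not in dictionary]
        if not scored:
            break
        dl, p = min(scored)
        if dl >= best:  # no remaining part pays for its own storage cost
            break
        dictionary.append(p)
        best = dl
    return dictionary, best


def coverage(dictionary, parts, folds) -> float:
    """Fraction of all residues in `folds` covered by dictionary parts."""
    total = sum(len(r) for r in folds.values())
    hit = sum(len(r & _cover(dictionary, parts, f)[0]) for f, r in folds.items())
    return hit / total


# Toy training set: two 10-residue folds sharing one hypothetical 5-residue part.
train_folds = {"A": set(range(10)), "B": set(range(10))}
train_parts = {
    "hairpin": {"A": [frozenset(range(5))], "B": [frozenset(range(5))]},
    "rare": {"A": [frozenset({9})]},
}
sizes = {"hairpin": 5, "rare": 1}
dictionary, bits = greedy_mdl(train_parts, sizes, train_folds)
print(dictionary, bits)  # ['hairpin'] 122.0

# Occurrences of the same parts in a held-out fold (assumed found by alignment).
test_folds = {"C": set(range(8))}
test_parts = {"hairpin": {"C": [frozenset(range(2, 7))]}}
test_cov = coverage(dictionary, test_parts, test_folds)
print(test_cov)  # 0.625
```

In this toy run the single-residue part `rare` is rejected. Storing it and widening every pointer code costs more than the 8 bits it saves, which is the sense in which a smaller dictionary can describe the data more economically.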
Journal: Bioinformatics
Volume 21, Suppl. 2
Publication date: 2005